Efficient Learning of Mesh-Based Physical Simulation with BSMS-GNN
Learning the physical simulation on large-scale meshes with flat Graph Neural
Networks (GNNs) and stacking Message Passings (MPs) is challenging due to the
scaling complexity w.r.t. the number of nodes and over-smoothing. There has
been growing interest in the community in introducing multi-scale
structures to GNNs for physical simulation. However, current state-of-the-art
methods are limited by their reliance on the labor-intensive drawing of coarser
meshes or building coarser levels based on spatial proximity, which can
introduce spurious edges across geometry boundaries. Inspired by bipartite
graph determination, we propose a novel pooling strategy, bi-stride, to
tackle these limitations. Bi-stride pools nodes on every other
frontier of a breadth-first search (BFS), removing the need for manually drawn
coarser meshes and avoiding the spurious edges introduced by spatial proximity.
Additionally, it enables a one-MP scheme per level and non-parameterized pooling
and unpooling by interpolation, resembling U-Nets, which significantly reduces
computational costs. Experiments show that the proposed framework,
BSMS-GNN, significantly outperforms existing methods in terms of both
accuracy and computational efficiency in representative physical simulations.
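
The bi-stride idea described above reduces to a simple graph operation: run a breadth-first search over the mesh connectivity and keep only the nodes on every other frontier. Below is a minimal, hedged sketch of that pooling step in Python; the adjacency format, the single seed node, and the even-depth convention are illustrative assumptions rather than the authors' implementation.

from collections import deque

def bi_stride_pool(adjacency, seed=0):
    # adjacency: dict mapping node id -> iterable of neighbour ids (assumption).
    # seed: hypothetical BFS starting node (assumption).
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    # Bi-stride: keep every other BFS frontier (here, the even depths).
    return sorted(n for n, d in depth.items() if d % 2 == 0)

# Toy example: a path graph 0-1-2-3-4 keeps nodes 0, 2 and 4 at the coarser level.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(bi_stride_pool(path))  # [0, 2, 4]
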
AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control
Neural implicit fields are powerful for representing 3D scenes and generating
high-quality novel views, but it remains challenging to use such implicit
representations for creating a 3D human avatar with a specific identity and
artistic style that can be easily animated. Our proposed method, AvatarCraft,
addresses this challenge by using diffusion models to guide the learning of
geometry and texture for a neural avatar based on a single text prompt. We
carefully design the optimization framework of neural implicit fields,
including a coarse-to-fine multi-bounding box training strategy, shape
regularization, and diffusion-based constraints, to produce high-quality
geometry and texture. Additionally, we make the human avatar animatable by
deforming the neural implicit field with an explicit warping field that maps
the target human mesh to a template human mesh, both represented using
parametric human models. This simplifies animation and reshaping of the
generated avatar by controlling pose and shape parameters. Extensive
experiments on various text descriptions show that AvatarCraft is effective and
robust in creating human avatars and rendering novel views, poses, and shapes.
Our project page is: https://avatar-craft.github.io/.
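
The animation mechanism described above amounts to evaluating a canonical (template-space) implicit field at warped query points. The sketch below illustrates one simple way this could look, assuming a nearest-vertex displacement warp between a posed target mesh and the template mesh; the function names and the warp rule are assumptions for illustration, not AvatarCraft's actual implementation.

import numpy as np

def warp_to_template(query, target_verts, template_verts):
    # Map a point from the posed (target) space to the template space by
    # borrowing the displacement of its nearest target-mesh vertex (assumption).
    i = np.argmin(np.linalg.norm(target_verts - query, axis=1))
    return query + (template_verts[i] - target_verts[i])

def query_avatar(query, canonical_field, target_verts, template_verts):
    # Evaluate the canonical implicit field at the warped query point.
    return canonical_field(warp_to_template(query, target_verts, template_verts))

# Toy data: a "template" at the origin and a "target" shifted by +1 along x.
template = np.zeros((10, 3))
target = template + np.array([1.0, 0.0, 0.0])
density = lambda p: float(np.exp(-np.sum(p ** 2)))  # stand-in for the learned field
print(query_avatar(np.array([1.0, 0.0, 0.0]), density, target, template))  # 1.0
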
Quantized GAN for Complex Music Generation from Dance Videos
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal
framework that generates complex musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input,
and learns to generate music samples that plausibly accompany the corresponding
input. Unlike most existing conditional music generation works that generate
specific types of mono-instrumental sounds using symbolic audio representations
(e.g., MIDI), and that heavily rely on pre-defined musical synthesizers, in
this work we generate dance music in complex styles (e.g., pop, breakdancing,
etc.) by employing a Vector Quantized (VQ) audio representation, and leverage
both its generality and the high abstraction capacity of its symbolic and
continuous counterparts. By performing an extensive set of experiments on
multiple datasets, and following a comprehensive evaluation protocol, we assess
the generative quality of our approach against several alternatives. The
quantitative results, which measure the music consistency, beats
correspondence, and music diversity, clearly demonstrate the effectiveness of
our proposed method. Last but not least, we curate a challenging dance-music
dataset of in-the-wild TikTok videos, which we use to further demonstrate the
efficacy of our approach in real-world applications, and which we hope will
serve as a starting point for relevant future research.
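
The Vector Quantized (VQ) audio representation mentioned above replaces continuous audio feature vectors with their nearest entries in a learned codebook. A minimal sketch of that quantization step follows, assuming a toy codebook and feature dimension; it illustrates the nearest-code lookup only, not D2M-GAN's full pipeline.

import numpy as np

def vector_quantize(features, codebook):
    # features: (T, D) continuous audio features; codebook: (K, D) code vectors.
    # Pairwise squared distances between features and codes, shape (T, K).
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)            # nearest code index per frame
    return idx, codebook[idx]         # discrete tokens and quantized features

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4))       # 8 frames, 4-dim features (toy sizes)
codes = rng.normal(size=(16, 4))      # 16-entry codebook (toy size)
indices, quantized = vector_quantize(feats, codes)
print(indices.shape, quantized.shape)  # (8,) (8, 4)
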
3DAvatarGAN: Bridging Domains for Personalized Editable Avatars
Modern 3D-GANs synthesize geometry and texture by training on large-scale
datasets with a consistent structure. Training such models on stylized,
artistic data, with often unknown and highly variable geometry and camera
information, has not yet been shown to be possible. Can we train a 3D-GAN on such
artistic data, while maintaining multi-view consistency and texture quality? To
this end, we propose an adaptation framework, where the source domain is a
pre-trained 3D-GAN, while the target domain is a 2D-GAN trained on artistic
datasets. We then distill the knowledge from a 2D generator to the source 3D
generator. To do that, we first propose an optimization-based method to align
the distributions of camera parameters across domains. Second, we propose
regularizations necessary to learn high-quality texture, while avoiding
degenerate geometric solutions, such as flat shapes. Third, we show a
deformation-based technique for modeling exaggerated geometry of artistic
domains, enabling -- as a byproduct -- personalized geometric editing. Finally,
we propose a novel inversion method for 3D-GANs linking the latent spaces of
the source and the target domains. Our contributions -- for the first time --
allow for the generation, editing, and animation of personalized artistic 3D
avatars on artistic datasets. Project page: https://rameenabdal.github.io/3DAvatarGAN
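
The camera-alignment step described above can be illustrated with a much simpler stand-in: optimize a per-dimension scale and shift so the source domain's camera-parameter distribution matches the target's first two moments. The sketch below is hedged under that moment-matching assumption and is not the authors' actual objective or parameterization.

import numpy as np

def align_cameras(source, target, steps=500, lr=0.05):
    # Fit per-dimension scale and shift so transformed source camera parameters
    # match the target distribution in mean and standard deviation.
    scale = np.ones(source.shape[1])
    shift = np.zeros(source.shape[1])
    src_mean, src_std = source.mean(0), source.std(0)
    for _ in range(steps):
        d_mean = (src_mean * scale + shift) - target.mean(0)
        d_std = src_std * scale - target.std(0)
        # Gradient step on the squared moment-matching loss.
        shift -= lr * 2 * d_mean
        scale -= lr * (2 * d_mean * src_mean + 2 * d_std * src_std)
    return scale, shift

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(256, 2))  # e.g. sampled (pitch, yaw) per domain
tgt = rng.normal(0.3, 0.5, size=(256, 2))
scale, shift = align_cameras(src, tgt)
print(np.round(scale, 2), np.round(shift, 2))  # roughly 0.5 scale, 0.3 shift
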
InfiniCity: Infinite-Scale City Synthesis
Toward infinite-scale 3D city synthesis, we propose a novel framework,
InfiniCity, which constructs and renders an unconstrainedly large and
3D-grounded environment from random noise. InfiniCity decomposes the seemingly
impractical task into three feasible modules, taking advantage of both 2D and
3D data. First, an infinite-pixel image synthesis module generates
arbitrary-scale 2D maps from the bird's-eye view. Next, an octree-based voxel
completion module lifts the generated 2D map to 3D octrees. Finally, a
voxel-based neural rendering module texturizes the voxels and renders 2D
images. InfiniCity can thus synthesize arbitrary-scale and traversable 3D city
environments, and allow flexible and interactive editing by users. We
quantitatively and qualitatively demonstrate the efficacy of the proposed
framework. Project page: https://hubert0527.github.io/infinicity
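
As a reading aid, the sketch below mirrors the three-module decomposition with toy stand-in functions (bird's-eye map synthesis, voxel completion, rendering); the function names, array shapes, and dense-voxel simplification are assumptions for illustration, not InfiniCity's learned components.

import numpy as np

def synthesize_birdseye_map(seed, size):
    # Stage 1 stand-in: infinite-pixel 2D map synthesis from noise.
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(size, size))      # toy semantic occupancy map

def complete_voxels(birdseye_map, height=8):
    # Stage 2 stand-in: lift the 2D map to a voxel grid (the paper uses octrees;
    # a dense array keeps this toy example simple).
    return np.repeat(birdseye_map[None, :, :], height, axis=0)

def render_views(voxels, num_views=4):
    # Stage 3 stand-in: "render" the textured voxels into 2D images.
    return [voxels.max(axis=0).astype(float) for _ in range(num_views)]

city_map = synthesize_birdseye_map(seed=0, size=32)
voxels = complete_voxels(city_map)
images = render_views(voxels)
print(city_map.shape, voxels.shape, len(images), images[0].shape)
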